May 29, 2025

Intense longitudinal data (ILD)

Uniquely suited to provide detailed information on psychological and behavioral processes over time at the within-person level

  • Daily Diary
  • Ecological momentary assessment (EMA)
  • Experience sampling method (ESM)
  • Ambulatory assessments (AA)
  • Sensors

publications using ILD increased from ~250 to >2.500 between 2000 and 2023 (PubMed)



Analyzing ILD data

Most popular model: (multilevel) vector autoregressive (VAR) model

VAR model: stable process in which a system is perturbed by outside influences.

We study how quickly the process returns to the single equilibrium point (e.g., the mean).

Example: emotion regulation.


McNeish & Hamaker, 2019.

Analyzing ILD data

But, what if the process switches between multiple phases, associated with different experiences?

Conceptual framework: e.g., dynamical systems that switch between a number of equilibrium points

Example: depression, bipolar disorder, … .


Helmich et al., 2024

Analyzing ILD data

Empirical data underline \(\rightarrow\) a single equilibrium point does not always provide appropriate reflection of the data:

  • multimodality detected in weekly symptom data (Hosenfeld et al., 2015).
  • multimodality and skewness frequently detected in EMA emotion measurements
    • Haslbeck et al., 2022: 14%–40% across studies exhibited a unimodal symmetric distribution.

Theory driven preference for HMMs

Advantages of modelling a process that switches between multiple (latent) categorical states:

  • quantify the dynamics over time
    • probability of staying in same mood state
      • e.g., high durations of a depressed mood state
    • if one exits a mood state, which mood state is the individual likely to switch to next?
  • empirically derived composition of (mood) states
  • derive the most likely sequence of (mood) states over time for each individual




Empirical example



Capturing (suicidal) crisis dynamics

Capturing crisis dynamics using ESM data


  • 26 patients, 60 observations per patient per CAB factor

Capturing crisis dynamics using ESM data



Theory driven preference for HMMs

In the empirical example on suicidal crisis states:

  • How long does someone tend to stay in a low or high suicidal crisis state?
  • If one exits a low suicicidal crisis state, does someone tend to switch directly to the highest suicidal crisis state, or (first) to an intermediate crisis state?
  • What is the composition of the empirically derived suicidal crisis states?
    • e.g., what makes a state a high crisis state or a low crisis state?
  • What is the sequence of suicidal crisis states over time for each individual?

The hidden Markov model

HMM: probabilistic model to infer hidden states \(S_t \in (1, 2, ..., m)\) at

  • each time point \(t = 1, 2, ..., T\)
  • from the observed series \((y_{k1}, y_{k2}, ..., y_{kT})\)
  • for dependent variable \(k = 1, 2, ..., K\)

The hidden Markov model

We assume:

  • observation \(y_{kt}\) is generated by an underlying, latent state \(S_t\),
  • sequence of hidden states \(\{S_t : t = 1, 2, ..., T\}\) forms a Markov chain: probability of switching to the next state \(S_{t+1}\) only depends on the current state \(S_t\): \(P(S_{t_1} | S_t, S_{t-1}, ..., S_1 = P(S_{t+1} | S_t)\)

The hidden Markov model

Two sets of parameters:

  1. probability of transitioning between latent states \(\gamma_{ij}=P(S_{t+1} = j | S_t = i)\)
  2. probability of emitting an observation given the current latent state \(P(y_{kt} | S_t)\)

The hidden Markov model

Given example data, assume

  • Normal emission distribution: \(P(y_{kt} | S_t) \sim N(\mu_{ki}, \sigma_{ki}^2)\)
  • \(y_{k.}\) are conditionally independent given \(\{S_t : t = 1, 2, ..., T\}\) to accommodate multivariate data

Conventional HMM: desinged for single sequence

Analyzing data from multiple individuals

  • A: strong assumption: individuals do not differ with respect to process dynamics
  • A: explain some of the differences using covariates (e.g., R package depmixS4)
  • B: composition of states not related over individuals, inefficient use of data
  • A, B, and C do not allow for:
    • (quantification of) natural variation between individuals
    • individual specific dynamics over time


Extending the HMM to a multilevel framework

Including a multilevel framework for the HMM, enables simultaneous estimation:

  • group level parameter estimates
  • subject specific parameter estimates for each subject \(n\)

Multinomial logistic regression to estimate transition probabilities \(\gamma_{nij}\)

\(\gamma_{nij} = \frac{\text{exp}(\alpha_{nij})}{1 + \sum_{{j} = 2}^m \text{exp}(\alpha_{ni{j}})} \quad\) where \(\quad \alpha_{nij} = \bar{\alpha}_{ij} + \epsilon_{\left[\alpha\right]nij}\)

In addition, for the variable & state dependent emission Normal means:

\(\quad {\mu_{nki} = \bar{\mu}_{ki} + \epsilon_{\left[\mu\right]nki}}\)

Adopt a Bayesian approach

Four CAB based crisis states



CAB crisis state dynamics

CAB crisis state dynamics

CAB crisis state dynamics

CAB crisis state dynamics

Personalized crisis trajectories


Capturing crisis dynamics – conclusion

Pairing fine grained ESM data with multilevel HMM, we

  • Identified four distinct and ascending CAB based crisis states
  • Quantified and visualized the pattern of crisis at the group and patient individual level
  • Observed considerable variation between patients, both in remaining within and transitioning between CAB crisis states
  • Highlighted the need for a personalized method




mHMMbayes



R package for multilevel hidden Markov models using Bayesian estimation

mHMMbayes

  • mHMMbayes fits multilevel hidden Markov models using Bayesian estimation
  • Allows for heterogeneity between subjects, while estimating one overall HMM
  • The model accommodates
    • continuous, categorical and count multivariate data
    • individual level covariates
    • missing data in the dependent variable(s)
  • includes functions for simulating data, obtaining the most likely hidden state sequence (Viterbi algorithm), and automated plotting.

mHMMbayes - Bayesian estimation

Advantages:

  • Bayesian methods yield many additional metrics, such as standard errors and credibility intervals.
  • Relatively easy to handle missing data (assuming MAR).
  • Allows incorporation of prior knowledge (through prior distribution), accommodates smaller sample sizes.
  • Adds flexibility, making it relatively easy to extent to more complex models.

mHMMbayes - Bayesian estimation

Disadvantages:

  • Label switching problem (more prominent in continuous and count data).
  • Solution:
    • sensible starting values
    • weakly informative priors
    • (hyper-prior values for) \(\bar{\mu}_i\) or \(\bar{\lambda}_i\) ordered in a sensible manner given data and process.

The mHMMbayes package: an example


26 patients, 60 observations per patient per CAB factor

Fitting a multilevel HMM – input

Fitting a 4 state multilevel hidden Markov model on the CAB crisis data:

library(mHMMbayes)

# Run a model without covariate(s) and default prior for gamma:
out_4st <- mHMM(s_data          = CAB_mHMM,
                data_distr      = "continuous",
                gen             = list(...),
                start_val       = list(...),
                emiss_hyp_prior = ...,
                mcmc            = list(J = J, burn_in = burn_in))

Fitting a multilevel HMM – general properties

## specifying general model properties:
 # number of states m
m <- 4
 # number of dependent variables
n_dep <- 5

gen <- list(m = m, n_dep = n_dep)

Fitting a multilevel HMM – starting values

Needed for the first iteration of the ‘forward algorithm’ \(\rightarrow\) determining the probability of each state for each point in time for each individual

Can be based on:

  • data driven approach \(\rightarrow\) fitting a single HMM on the completely pooled data (e.g., using DepmixS4).
  • theory driven \(\rightarrow\) given the process you are studying, the variables collected, and the scale of the variables, what are sensible
    • means (and SDs) for each of the states (e.g., a ‘good’ and ‘bad’ state),
    • transition probabilities.
  • combination \(\rightarrow\) combine theoretical considerations with information from (visual) exploration of the data.

Fitting a multilevel HMM – starting values

Starting values for the transition probability matrix

  • values on the diagonal represent ‘self-transitions’
  • off-diagonal values represent the probabilities to switch between states
  • usually, the diagonal is composed of larger values than the off-diagonal
  • each row needs to sum to 1
start_tpm <- matrix(c(..., ..., ..., ...,
                      ..., ..., ..., ...,
                      ..., ..., ..., ...,
                      ..., ..., ..., ...), byrow = TRUE, ncol = m, nrow = m)

Fitting a multilevel HMM – starting values

Starting values for the transition probability matrix

  • values on the diagonal represent ‘self-transitions’
  • off-diagonal values represent the probabilities to switch between states
  • usually, the diagonal is composed of larger values than the off-diagonal
  • each row needs to sum to 1
start_tpm <- matrix(c(0.70, 0.10, 0.10, 0.10,
                      0.10, 0.70, 0.10, 0.10,
                      0.10, 0.10, 0.70, 0.10,
                      0.10, 0.10, 0.10, 0.70), byrow = TRUE, ncol = m, nrow = m)
start_tpm <- matrix(c(0.91, 0.03, 0.03, 0.03,
                      0.03, 0.91, 0.03, 0.03,
                      0.03, 0.03, 0.91, 0.03,
                      0.03, 0.03, 0.03, 0.91), byrow = TRUE, ncol = m, nrow = m)

Fitting a multilevel HMM – starting values

Starting values for the (Gaussian) emission distribution

  • what are likely observations for each of the states (e.g., for a low and high crisis state)?
  • means and SDs need to be specified such that all observations within a variable receive reasonable support.
  • take negative correlations between variables into account!
    • E.g., high self-control likely goes together with low suicidal ideation and vice versa.
  • list with one matrix per dependent variable, with 2 columns – mean and SD – and m rows.

Fitting a multilevel HMM – starting values

Starting values for the (Gaussian) emission distribution

start_emission <- list(matrix(c(..., ..., 
                                ..., ...,
                                ..., ..., 
                                ..., ...), byrow = TRUE, ncol = 2, nrow = m), # self-control
                       matrix(c(..., ..., 
                                ..., ...,
                                ..., ..., 
                                ..., ...), byrow = TRUE, ncol = 2, nrow = m), # negative-affect
                       matrix(c(..., ..., 
                                ..., ...,
                                ..., ..., 
                                ..., ...), byrow = TRUE, ncol = 2, nrow = m), # contact avoidance
                       ...

Fitting a multilevel HMM – starting values

Starting values for the (Gaussian) emission distribution


Fitting a multilevel HMM – starting values

Starting values for the (Gaussian) emission distribution

start_emission <- list(matrix(c(60, 15, 
                                40, 10,
                                20, 10, 
                                 3, 2), byrow = TRUE, ncol = 2, nrow = m), # self-control
                       matrix(c(20, 10, 
                                45, 10,
                                65, 10, 
                                82, 10), byrow = TRUE, ncol = 2, nrow = m), # negative-affect
                       matrix(c( 3, 2, 
                                15, 5,
                                40, 10, 
                                75, 10), byrow = TRUE, ncol = 2, nrow = m), # contact avoidance
                       ...

Fitting a multilevel HMM – emission prior

Specifying parameter values for the prior on the emission distribution is mandatory, as it helps with the label switching problem. We want to convey:

  • what are likely observations for each of the states (e.g., for a low and high crisis state)?
  • what is a likley amount of variation between individuals in the state means?
  • means and variances need to be specified such that all observations within a variable receive reasonable support.
  • take negative correlations between variables into account!
    • E.g., high self-control likely goes together with low suicidal ideation and vice versa.
  • can re-use some of the values specified in the starting values!

Fitting a multilevel HMM – emission prior

Specified using the function prior_emiss_cont():

prior_emiss_cont(
  gen,              # general model properties m and n_dep
  emiss_mu0,        # hyper-prior mean values of the Normal emission distribution 
  emiss_K0,         # number of hypothetical subjects emiss_mu0 is based on
  emiss_V,          # degrees of freedom inverse Gamma hyper-prior on BETWEEN subject variance 
  emiss_nu,         # hyper-prior variance of Inverse Gamma on BETWEEN subject variance 
  emiss_a0,         # shape values of IG hyper-prior on emission standard deviation^2 
  emiss_b0,         # rate values of IG hyper-prior on emission standard deviation^2 
)

Most abstract is probably to specify emiss_a0 and emiss_b0 who determine the IG hyper-prior on the emission standard deviation\(^2\)

More on IG hyper-prior on emission \(\sigma^2\)

Inverse Gamma hyper-prior on variance of Normal emission distribution:

Fitting a multilevel HMM – emission prior

Specified using the function prior_emiss_cont():

emiss_prior <- prior_emiss_cont(
  gen              = list(m = m, n_dep = n_dep),
  emiss_mu0,       = list(matrix(c(60, 40, 20,  3), nrow = 1), # self-control
                          matrix(c(20, 45, 65, 82), nrow = 1), # negative-affect
                          matrix(c( 3, 15, 40, 75), nrow = 1), # contact avoidance)
                          ... )
  emiss_K0         = rep(list((1)),n_dep),
  emiss_V,         = rep(list((1)),n_dep),
  emiss_nu,        = rep(list(rep(10^2, m)), n_dep),
  emiss_a0,        = rep(list(rep(1.5, m)), n_dep),
  emiss_b0,        = rep(list(rep(20, m)), n_dep))
)

Fitting a multilevel HMM

library(mHMMbayes)

# Run a model without covariate(s) and default prior for gamma:
out_4st <- mHMM(s_data          = CAB_mHMM,
                data_distr      = "continuous",,
                gen             = list(m = m, n_dep = n_dep),
                start_val       = c(list(start_tpm), start_emission),
                emiss_hyp_prior = emiss_prior,
                mcmc            = list(J = J, burn_in = burn_in))

Model output – print()

out_4st
## Number of subjects: 26 
## 
## 10000 iterations used in the MCMC algorithm with a burn in of 5000 
## Average Log likelihood over all subjects: -1256.123 
## Average AIC over all subjects:  2616.246 
## Average AICc over all subjects: 2993.319 
## 
## Number of states used: 4 
## 
## Number of dependent variables used: 5 
## 
## Type of dependent variable(s): continuous

Model output – summary()

summary(out_4st)
## State transition probability matrix 
##  (at the group level): 
##  
##              To state 1 To state 2 To state 3 To state 4
## From state 1      0.808      0.106      0.047      0.039
## From state 2      0.048      0.717      0.116      0.119
## From state 3      0.042      0.210      0.644      0.104
## From state 4      0.022      0.254      0.180      0.544
## 
##  
## Emission distribution ( continuous ) for each of the dependent variables 
##  (at the group level): 
##  
## $NegatiefAffect
##           Mean     SD
## State 1 31.648 13.321
## State 2 51.616 12.412
## State 3 59.418 14.625
## State 4 72.094 16.227
## 
## $Controle
##           Mean     SD
## State 1 43.719 14.819
## State 2 34.521 13.717
## State 3 32.868 17.035
## State 4 14.119  8.765
## 
## $Terugtrekken
##           Mean     SD
## State 1 17.333 15.906
## State 2 30.279 18.155
## State 3 39.081 25.899
## State 4 48.589 33.481
## 
## $BehoefteContact
##           Mean     SD
## State 1 41.840 17.458
## State 2 40.459 20.986
## State 3 42.033 25.729
## State 4  6.978  3.555
## 
## $Suicidaliteit
##           Mean     SD
## State 1  3.903  2.418
## State 2 48.334 16.300
## State 3 55.728 20.037
## State 4 65.546 17.589

Start practical 1




Let’s try to run our own multilevel HMM in the lab!

References

  • J. Haslbeck, O. Ryan, & F. Dablander (2023). Multimodality and skewness in emotion time series. Emotion, 23(8), 2117–2141. Doi: 10.1037/emo0001218
  • W. W. Hale III, & E. Aarts (2023). Hidden Markov model detection of interpersonal interaction dynamics in predicting patient depression improvement in psychotherapy: Proof-of-concept study. Journal of Affective Disorders Reports, 14, 100635. Doi: 10.1016/j.jadr.2023.100635
  • S. Mildiner Moraga, F.M. Bos, B. Doornbos, R. Bruggeman, L. van der Krieke, E. Snippe, E. Aarts (2022). Evidence for severe mood instability in patients with bipolar disorder: Applying multilevel hiden Markov modelling to intensive longitudinal ecological momentary assessment data. PsyArXiv preprint. Doi: 10.31234/osf.io/egp82
  • Mildiner Moraga, S., & Aarts, E. (2023). Go Multivariate: Recommendations on Bayesian Multilevel Hidden Markov Models with Categorical Data. Multivariate Behavioral Research, 1-29. Doi: 10.1080/00273171.2023.2205392

Note: specifying (weakly) informative-priors – gamma

Transition probability matrix Gamma (prior specification is optional):

prior_gamma(
  m,                  # number of states in the model
  gamma_mu0,          # hyper-prior mean values on the Multinomial logit intercepts
  gamma_K0 = NULL,    # number of hypothetical subjects gamma_mu0 is based on
  gamma_nu = NULL,    # degrees of freedom inverse Wishart hyper-prior on covariance 
  gamma_V = NULL,     # hyper-prior variance-covariance matrix of Inverse Wishart 
)


prob_to_int(
  prob_matrix         # complete transition probability matrix gamma
)

mHMMbayes - Metropolis within Gibbs sampler

  1. Obtain forward probabilities \(\alpha_{t(ni)} = Pr\big(O_1 = o_1, ..., O_t = o_t, S_t = i\big)\) for each subject using current values of subject specific model parameters.
  2. Sample hidden state sequences in backward manner using \(\alpha_{t(ni)}\) and \(\gamma_{nij}\).
  3. Given current subject-specific parameters, draw new group-level parameter estimates using Gibbs step
  4. Given
    • observed event sequence for each subject
    • sampled hidden state sequence for each subject
    • group-level parameter distributions
    • draw new subject-specific parameters using RW Metropolis sampler
  5. Repeat step 1-4 a very large number of times